Introduction to the Logistic Regression Model for Used-Available Data

A general introduction to the biological rationale of this lab was given in the Lab 2 introduction. In Lab 3, we extend the analysis of habitat use by wolves from two wolf packs monitored from 2001-2005 in Banff National Park to an analysis of habitat selection, using univariate logistic models to estimate a resource selection function (RSF) as a function of the availability of resources.

Boyce and McDonald (1999) and Manly et al. (2002) defined a resource selection function (RSF) as any function that is proportional to the probability of use of a resource unit, modeled as a function of covariates that allow biologists to gain insight into the underlying factors affecting the probability of use. Thus, HSI (habitat suitability index) models, expert-opinion models, models based solely on use, occupancy models, and selection models are all potentially classified as RSF models if their function relates to the probability of use. The basic form of a resource selection function model relates the covariates at used locations to the covariates at un-used or available locations. As we shall see in future weeks, there is a CRITICAL distinction in the interpretation of the RSF depending on which design is employed: used-available or used-unused. For now, however, we will simply consider the best way to compare used locations against available locations.

logit(p) = ln(p / (1 - p)) = B0 + B1X1 + ... + BnXn

p = exp(B0 + B1X1 + ... + BnXn) / (1 + exp(B0 + B1X1 + ... + BnXn))

We will review the equation in class/lab.
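Before lab, it may help to see that the two forms of the equation are inverses of each other. Base R provides both: plogis() is the inverse-logit (logistic) function and qlogis() is the logit. A quick sanity check with made-up example coefficients (b0 = 1, b1 = 0.07 are arbitrary, chosen only for illustration):

```r
## inverse-logit by hand vs. base R's plogis()
b0 <- 1; b1 <- 0.07          ## example coefficients, for illustration only
x  <- 10
eta <- b0 + b1 * x                   ## linear predictor on the logit scale
p   <- exp(eta) / (1 + exp(eta))     ## inverse-logit by hand
all.equal(p, plogis(eta))            ## TRUE: plogis() is the same function
qlogis(p)                            ## the logit transform recovers eta = 1.7
```

Whatever the value of eta, p always falls between 0 and 1, which is the whole point of the logit link.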

0.1 Preliminaries: getting started, loading packages, setting working directory

## Loading required package: ggplot2
## Loading required package: foreign
## Loading required package: lattice
## Loading required package: psych
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
## Loading required package: effects
## Loading required package: plyr
## ggplot2 foreign lattice   psych effects    plyr 
##    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE

0.2 Preliminaries: Merging the wolf availability sample from the KDEs from last week

Note that last week we merged the wolf USED data, but not the availability data. Here is the code to merge the availability data. Note I did this for you; skip to Objective 1.0 below.

#rdavail <- as.data.frame(cov.availRD)
#str(rdavail)
#rdavail$pack <- c("Red Deer")
#str(rdused)

## repeat for Bow Valley pack
#bvavail <- as.data.frame(cov.availBV)
#str(bvavail)
#bvavail$pack <- c("Bow Valley")
#str(bvavail)

## merge the two availability samples together
#wolfavail <- rbind(rdavail, bvavail)
#str(wolfavail)

## and for next week, let's add a new column coding 1 = used, 0 = available
#wolfavail$used <- 0

#write.table(wolfavail, file = "/Users/mark.hebblewhite/Dropbox/WILD 562/Spring2017/lab2/new/wolfavail.csv", row.names=FALSE, na="", col.names=TRUE, sep=",")

Objective 1.0 Merging wolf USED and wolf AVAIL datasets

# first read them back into active memory. 

wolfused <-read.csv("wolfused.csv", header = TRUE)
wolfavail <-read.csv("wolfavail.csv", header = TRUE)
str(wolfavail)
## 'data.frame':    2000 obs. of  10 variables:
##  $ deer_w2             : int  1 NA 1 3 NA NA 3 1 NA 3 ...
##  $ moose_w2            : int  1 NA 1 3 NA NA 3 1 NA 3 ...
##  $ elk_w2              : int  1 NA 1 4 NA NA 5 1 NA 3 ...
##  $ sheep_w2            : int  1 NA 1 6 NA NA 5 1 NA 5 ...
##  $ goat_w2             : int  4 NA 4 5 NA NA 5 4 NA 5 ...
##  $ wolf_w2             : int  1 NA 1 4 NA NA 4 1 NA 3 ...
##  $ Elevation2          : num  2694 1991 2620 2322 2245 ...
##  $ DistFromHumanAccess2: num  3304 1218 4146 2011 2157 ...
##  $ pack                : Factor w/ 2 levels "Bow Valley","Red Deer": 2 2 2 2 2 2 2 2 2 2 ...
##  $ used                : int  0 0 0 0 0 0 0 0 0 0 ...
wolfkde <- rbind(wolfused, wolfavail)
str(wolfkde)
## 'data.frame':    2413 obs. of  10 variables:
##  $ deer_w2             : int  4 4 4 4 NA 1 4 4 4 3 ...
##  $ moose_w2            : int  5 4 5 5 NA 1 5 5 5 5 ...
##  $ elk_w2              : int  5 4 5 5 NA 1 5 5 5 4 ...
##  $ sheep_w2            : int  3 1 4 4 NA 1 4 4 3 1 ...
##  $ goat_w2             : int  3 3 1 1 NA 4 1 1 1 3 ...
##  $ wolf_w2             : int  5 4 5 5 NA 1 5 5 5 4 ...
##  $ Elevation2          : num  1766 1789 1765 1743 1987 ...
##  $ DistFromHumanAccess2: num  427.4 360.5 283.7 167.4 27.9 ...
##  $ pack                : Factor w/ 2 levels "Bow Valley","Red Deer": 2 2 2 2 2 2 2 2 2 2 ...
##  $ used                : int  1 1 1 1 1 1 1 1 1 1 ...
table(wolfkde$used, wolfkde$pack)
##    
##     Bow Valley Red Deer
##   0       1000     1000
##   1        320       93
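If proportions are easier to read than raw counts, prop.table() converts a table produced by table(wolfkde$used, wolfkde$pack) into column-wise proportions:

```r
## proportions of used vs. available locations within each pack
tab <- table(wolfkde$used, wolfkde$pack)
prop.table(tab, margin = 2)   ## margin = 2 gives proportions within each pack (column)
```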
table(wolfkde$used, wolfkde$deer_w2)
##    
##       1   3   4   5   6   7
##   0 488 347 739 168  17   6
##   1   2  36 151 122  85   4
## next we will create a new variable called usedFactor and graphically compare USED and AVAIL locations for prey
wolfkde$usedFactor <- factor(wolfkde$used, labels=c('0','1'))
str(wolfkde)
## 'data.frame':    2413 obs. of  11 variables:
##  $ deer_w2             : int  4 4 4 4 NA 1 4 4 4 3 ...
##  $ moose_w2            : int  5 4 5 5 NA 1 5 5 5 5 ...
##  $ elk_w2              : int  5 4 5 5 NA 1 5 5 5 4 ...
##  $ sheep_w2            : int  3 1 4 4 NA 1 4 4 3 1 ...
##  $ goat_w2             : int  3 3 1 1 NA 4 1 1 1 3 ...
##  $ wolf_w2             : int  5 4 5 5 NA 1 5 5 5 4 ...
##  $ Elevation2          : num  1766 1789 1765 1743 1987 ...
##  $ DistFromHumanAccess2: num  427.4 360.5 283.7 167.4 27.9 ...
##  $ pack                : Factor w/ 2 levels "Bow Valley","Red Deer": 2 2 2 2 2 2 2 2 2 2 ...
##  $ used                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ usedFactor          : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...

Objective 1.1. Graphical data exploration for all wolves

par(mfrow = c(2,3))
boxplot(deer_w2~usedFactor, data=wolfkde, main = "Deer Used-Avail", xlab="usedFactor", ylab="deer_w2")
boxplot(elk_w2~usedFactor, main = "Elk Used-Avail", ylab="elk_w2", xlab="usedFactor", data=wolfkde)
boxplot(moose_w2~usedFactor, main = "Moose Used-Avail", ylab="moose_w2", xlab="usedFactor", data=wolfkde)
boxplot(goat_w2~usedFactor, main = "Goat Used-Avail", ylab="goat_w2", xlab="usedFactor", data=wolfkde)
boxplot(sheep_w2~usedFactor, main = "Sheep Used-Avail", ylab="sheep_w2", xlab="usedFactor", data=wolfkde)

#### Class discussion: how do we interpret the comparison of used and available for ungulate prey for wolves in Banff?

Now let's do the same for Elevation2 and DistFromHumanAccess2.

par(mfrow = c(1,2))
boxplot(Elevation2~usedFactor, data=wolfkde, main = "Elevation Used-Avail", xlab="usedFactor", ylab="Elevation2")
boxplot(DistFromHumanAccess2~usedFactor, data=wolfkde, main = "Human Access Used-Avail", ylab="DistFromHumanAccess2", xlab="usedFactor")

#### Class discussion: similarly, what do we think? Do wolves like humans? Do wolves like elevation?

Note that the above analyses are for both packs combined, but your analyses last week confirmed that the Red Deer wolf pack, for example, dwells at higher elevations than the Bow Valley pack. We should probably consider the wolf packs separately.

## subset for Bow Valley Pack
bvkde<- subset(wolfkde, subset=pack =="Bow Valley")
par(mfrow = c(2,3))
boxplot(deer_w2~usedFactor, data=bvkde, main = "Deer Used-Avail", xlab="usedFactor", ylab="deer_w2")
boxplot(elk_w2~usedFactor, data=bvkde, main = "Elk Used-Avail", ylab="elk_w2", xlab="usedFactor")
boxplot(moose_w2~usedFactor, data=bvkde, main = "Moose Used-Avail", ylab="moose_w2", xlab="usedFactor")
boxplot(goat_w2~usedFactor, data=bvkde, main = "Goat Used-Avail", ylab="goat_w2", xlab="usedFactor")
boxplot(sheep_w2~usedFactor, data=bvkde, main = "Sheep Used-Avail", ylab="sheep_w2", xlab="usedFactor")
## Now let's do the same for Elevation2 and DistFromHumanAccess2
par(mfrow = c(1,2))

boxplot(Elevation2~usedFactor, data=bvkde, main = "Elevation Used-Avail", xlab="usedFactor", ylab="Elevation2")
boxplot(DistFromHumanAccess2~usedFactor, data=bvkde, main = "Human Access Used-Avail", ylab="DistFromHumanAccess2", xlab="usedFactor")

Class discussion: how does the Bow Valley pack compare to all wolves in BNP?

## subset for Red Deer Wolf
rdkde <- subset(wolfkde, subset=pack=="Red Deer")
table(rdkde$used, rdkde$pack)
##    
##     Bow Valley Red Deer
##   0          0     1000
##   1          0       93
par(mfrow = c(2,3))
boxplot(deer_w2~usedFactor, data=rdkde, main = "Deer Used-Avail", xlab="usedFactor", ylab="deer_w2")
boxplot(elk_w2~usedFactor, data=rdkde, main = "Elk Used-Avail", ylab="elk_w2", xlab="usedFactor")
boxplot(moose_w2~usedFactor, data=rdkde, main = "Moose Used-Avail", ylab="moose_w2", xlab="usedFactor")
boxplot(goat_w2~usedFactor, data=rdkde, main = "Goat Used-Avail", ylab="goat_w2", xlab="usedFactor")
boxplot(sheep_w2~usedFactor, data=rdkde, main = "Sheep Used-Avail", ylab="sheep_w2", xlab="usedFactor")
## Now let's do the same for Elevation2 and DistFromHumanAccess2
par(mfrow = c(1,2))

boxplot(Elevation2~usedFactor, data=rdkde, main = "Elevation Used-Avail", xlab="usedFactor", ylab="Elevation2")
boxplot(DistFromHumanAccess2~usedFactor, data=rdkde, main = "Human Access Used-Avail", ylab="DistFromHumanAccess2", xlab="usedFactor")

#### Class discussion: and what about the red deer pack?

It would be nice if there were a way to graph by both pack and used status at the same time. There is:

## Can make more complex box plots
par(mfrow = c(1,1))
boxplot(Elevation2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Elevation Used-Avail")

boxplot(DistFromHumanAccess2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Human Access Used-Avail")

boxplot(deer_w2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Deer Used-Avail")

boxplot(moose_w2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Moose Used-Avail")

boxplot(elk_w2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Elk Used-Avail")

boxplot(goat_w2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Goat Used-Avail")

boxplot(sheep_w2~used+pack, data = wolfkde, main = "Bow Valley vs. Red Deer Sheep Used-Avail")

## using lattice package
bwplot(sheep_w2+ goat_w2 + elk_w2+moose_w2+ deer_w2~as.factor(usedFactor)|pack, data = wolfkde, layout = c(2,5), pch = "|", outer = TRUE)

Objective 1.2 - Numerical Summary Statistics

## numerical summary statistics
aggregate(wolfkde, by=list(wolfkde$pack, wolfkde$used), FUN=mean, na.rm=TRUE)
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
##      Group.1 Group.2  deer_w2 moose_w2   elk_w2 sheep_w2  goat_w2  wolf_w2
## 1 Bow Valley       0 3.483000 3.641000 3.350000 2.972000 2.963000 3.657000
## 2   Red Deer       0 2.596078 2.816993 2.839216 2.628758 3.222222 2.921569
## 3 Bow Valley       1 4.890625 4.137500 4.675000 2.340625 1.540625 4.878125
## 4   Red Deer       1 3.712500 4.125000 4.087500 3.175000 2.575000 4.175000
##   Elevation2 DistFromHumanAccess2 pack used usedFactor
## 1   1887.752             927.1657   NA    0         NA
## 2   2164.276            1366.7303   NA    0         NA
## 3   1452.303             256.9808   NA    1         NA
## 4   1877.511             485.3091   NA    1         NA
# psych package
describeBy(wolfkde, wolfkde$pack)
## $`Bow Valley`
##                      vars    n    mean     sd  median trimmed    mad  min
## deer_w2                 1 1320    3.82   1.37    4.00    3.92   1.48    1
## moose_w2                2 1320    3.76   1.35    4.00    3.88   1.48    1
## elk_w2                  3 1320    3.67   1.30    4.00    3.76   1.48    1
## sheep_w2                4 1320    2.82   1.56    3.00    2.74   2.97    1
## goat_w2                 5 1320    2.62   1.70    3.00    2.42   2.97    1
## wolf_w2                 6 1320    3.95   1.24    4.00    4.10   1.48    1
## Elevation2              7 1320 1782.19 362.15 1656.25 1739.92 347.41 1401
## DistFromHumanAccess2    8 1283  762.10 782.60  473.94  630.37 531.63    0
## pack*                   9 1320    1.00   0.00    1.00    1.00   0.00    1
## used                   10 1320    0.24   0.43    0.00    0.18   0.00    0
## usedFactor*            11 1320    1.24   0.43    1.00    1.18   0.00    1
##                          max   range  skew kurtosis    se
## deer_w2                 7.00    6.00 -0.68     0.21  0.04
## moose_w2                7.00    6.00 -0.65     0.03  0.04
## elk_w2                  7.00    6.00 -0.55     0.24  0.04
## sheep_w2                7.00    6.00  0.20    -1.04  0.04
## goat_w2                 7.00    6.00  0.55    -0.87  0.05
## wolf_w2                 7.00    6.00 -0.86     0.87  0.03
## Elevation2           2844.61 1443.61  0.77    -0.53  9.97
## DistFromHumanAccess2 4117.53 4117.53  1.51     2.06 21.85
## pack*                   1.00    0.00   NaN      NaN  0.00
## used                    1.00    1.00  1.20    -0.56  0.01
## usedFactor*             2.00    1.00  1.20    -0.56  0.01
## 
## $`Red Deer`
##                      vars    n    mean      sd  median trimmed     mad
## deer_w2                 1  845    2.70    1.39    3.00    2.70    1.48
## moose_w2                2  845    2.94    1.57    3.00    2.91    2.97
## elk_w2                  3  845    2.96    1.46    3.00    2.92    1.48
## sheep_w2                4  845    2.68    1.68    3.00    2.54    2.97
## goat_w2                 5  845    3.16    1.59    3.00    3.12    1.48
## wolf_w2                 6  845    3.04    1.43    4.00    3.03    1.48
## Elevation2              7 1093 2139.88  333.50 2131.26 2126.32  367.77
## DistFromHumanAccess2    8 1093 1291.73 1340.70  895.99 1057.11 1030.01
## pack*                   9 1093    2.00    0.00    2.00    2.00    0.00
## used                   10 1093    0.09    0.28    0.00    0.00    0.00
## usedFactor*            11 1093    1.09    0.28    1.00    1.00    0.00
##                          min     max   range  skew kurtosis    se
## deer_w2                 1.00    7.00    6.00 -0.17    -1.49  0.05
## moose_w2                1.00    7.00    6.00 -0.06    -1.40  0.05
## elk_w2                  1.00    7.00    6.00 -0.15    -1.10  0.05
## sheep_w2                1.00    7.00    6.00  0.36    -1.25  0.06
## goat_w2                 1.00    7.00    6.00 -0.13    -1.10  0.05
## wolf_w2                 1.00    7.00    6.00 -0.33    -1.05  0.05
## Elevation2           1493.69 3256.15 1762.47  0.31    -0.54 10.09
## DistFromHumanAccess2    0.00 6784.45 6784.45  1.64     2.78 40.55
## pack*                   2.00    2.00    0.00   NaN      NaN  0.00
## used                    0.00    1.00    1.00  2.97     6.83  0.01
## usedFactor*             1.00    2.00    1.00  2.97     6.83  0.01
## 
## attr(,"call")
## by.data.frame(data = x, INDICES = group, FUN = describe, type = type)
tapply(wolfkde$pack, wolfkde$used, summary)
## $`0`
## Bow Valley   Red Deer 
##       1000       1000 
## 
## $`1`
## Bow Valley   Red Deer 
##        320         93
## introduction to the plyr package
## http://stat545.com/block013_plyr-ddply.html
## and
## http://www.cookbook-r.com/Manipulating_data/Summarizing_data/
## and Hadley Wickham's original plyr paper: The split-apply-combine strategy for data analysis, Journal of Statistical Software, 40(1):1-29, 2011.
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(elk_w2, na.rm=TRUE), sd=sd(elk_w2, na.rm=TRUE))
##         pack used     mean        sd
## 1 Bow Valley    0 3.350000 1.2740185
## 2 Bow Valley    1 4.675000 0.7882364
## 3   Red Deer    0 2.839216 1.4647800
## 4   Red Deer    1 4.087500 0.8596989
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(deer_w2, na.rm=TRUE), sd=sd(deer_w2, na.rm=TRUE))
##         pack used     mean        sd
## 1 Bow Valley    0 3.483000 1.3211572
## 2 Bow Valley    1 4.890625 0.9048318
## 3   Red Deer    0 2.596078 1.4059467
## 4   Red Deer    1 3.712500 0.6202031
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(goat_w2, na.rm=TRUE), sd=sd(goat_w2, na.rm=TRUE))
##         pack used     mean       sd
## 1 Bow Valley    0 2.963000 1.708962
## 2 Bow Valley    1 1.540625 1.096406
## 3   Red Deer    0 3.222222 1.590310
## 4   Red Deer    1 2.575000 1.524276
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(moose_w2, na.rm=TRUE), sd=sd(moose_w2, na.rm=TRUE))
##         pack used     mean        sd
## 1 Bow Valley    0 3.641000 1.3798634
## 2 Bow Valley    1 4.137500 1.1955377
## 3   Red Deer    0 2.816993 1.5754909
## 4   Red Deer    1 4.125000 0.8912372
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(sheep_w2, na.rm=TRUE), sd=sd(sheep_w2, na.rm=TRUE))
##         pack used     mean       sd
## 1 Bow Valley    0 2.972000 1.552942
## 2 Bow Valley    1 2.340625 1.464172
## 3   Red Deer    0 2.628758 1.707148
## 4   Red Deer    1 3.175000 1.280575
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(Elevation2, na.rm=TRUE), sd=sd(Elevation2, na.rm=TRUE))
##         pack used     mean        sd
## 1 Bow Valley    0 1887.752 354.15484
## 2 Bow Valley    1 1452.303  73.57596
## 3   Red Deer    0 2164.276 333.75452
## 4   Red Deer    1 1877.511 185.78956
ddply(wolfkde, c("pack", "used"), summarize, mean=mean(DistFromHumanAccess2, na.rm=TRUE), sd=sd(DistFromHumanAccess2, na.rm=TRUE))
##         pack used      mean        sd
## 1 Bow Valley    0  927.1657  829.0498
## 2 Bow Valley    1  256.9808  212.6737
## 3   Red Deer    0 1366.7303 1368.4931
## 4   Red Deer    1  485.3091  530.0458
## Challenge question: how could we do all of these summaries in one step with ddply?
## Becoming an expert at ddply is a HUGE advantage.
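One possible one-step answer, sketched here under the assumption that wolfkde is still loaded: plyr's numcolwise() wraps a function so that ddply applies it to every numeric column within each group at once.

```r
## one possible one-step answer: numcolwise(mean) applies mean() to
## every numeric column of wolfkde within each pack-by-used group
ddply(wolfkde, c("pack", "used"), numcolwise(mean), na.rm = TRUE)
```

This reproduces all of the group means above in a single call; swapping in numcolwise(sd) gives the standard deviations.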

Objective 1.3: Graphical exploration with ggplot2

Let's bring in the shapefile from last week, wolfyht.csv, just as a data frame (i.e., the .dbf attribute table of the shapefile, which I've already converted to .csv; note you can import .dbf files directly with the foreign package).

The goal here is just to continue to build our data exploration skills

wolfyht <-read.csv("wolfyht.csv", header = TRUE)
str(wolfyht)
## 'data.frame':    413 obs. of  13 variables:
##  $ ID        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ ID2       : int  944 945 946 947 948 949 950 951 952 953 ...
##  $ NAME      : int  44 44 44 44 44 44 44 44 44 44 ...
##  $ NoCollared: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Pack      : Factor w/ 2 levels "Bow valley","Red Deer": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DATE_     : Factor w/ 275 levels "1/1/03","1/10/02",..: 81 102 108 107 106 110 114 116 118 122 ...
##  $ Season    : Factor w/ 1 level "winter": 1 1 1 1 1 1 1 1 1 1 ...
##  $ EASTING   : int  588900 567700 587500 588100 558477 586000 578950 588150 588155 584800 ...
##  $ NORTHING  : int  5670800 5686600 5672700 5671500 5697212 5674300 5679200 5671250 5672074 5675700 ...
##  $ HOW_OBTAIN: int  2 2 2 2 2 2 2 2 2 2 ...
##  $ CONFIDENCE: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PackID    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ elevati   : int  1402 1498 1402 1402 1587 1402 1433 1402 1405 1404 ...
# download the ggplot2 pdf manual and make sure you have the R Graphics Cookbook open too
# chapter 6 in R Graphics cookbook
# Data exploration
ggplot(wolfyht, aes(x=EASTING, y = NORTHING)) + geom_point() + stat_density2d() + facet_grid(Pack ~ ., scales="free")

# now show the density as shaded tiles, faceted by pack
ggplot(wolfyht, aes(x=EASTING, y = NORTHING)) + geom_point() + stat_density2d(aes(alpha=..density..), geom="tile", contour=FALSE) + facet_grid(Pack ~ .)

ggplot(wolfyht, aes(x=EASTING, y = NORTHING)) + geom_point() + stat_density2d(aes(fill=..density..), geom="raster", contour=FALSE)

#### Basically, what we have made here is a visually pleasing version of the kernel density estimator we made last week in adehabitatHR!

Objective 2. Univariate Logistic Regression

Now we are going to conduct univariate logistic regression following advice from Hosmer and Lemeshow (2000), who advocate such an approach for understanding the effect of each covariate on the probability of use. There are at least three different ways to conduct a logistic regression in most statistics packages, including R, but following from Labs 1 and 2, we are going to use glm(), the GENERALIZED LINEAR MODEL function, to increase familiarity with this format.

Objective 2.1 First, simulating data to understand the model

x <- -50:50 ## just creating a sequence of integers from -50 to 50.
y = rbinom(length(x), 1, plogis(1+0.07*x) )
## unpack this line with ?rbinom
## and ?plogis

plot( y ~ x)
abline(lm(y~x))

wrong = lm(y~x)
summary(wrong)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.80513 -0.22644 -0.03017  0.28061  0.82714 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.633663   0.036862  17.190  < 2e-16 ***
## x           0.010716   0.001264   8.476 2.27e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3705 on 99 degrees of freedom
## Multiple R-squared:  0.4205, Adjusted R-squared:  0.4147 
## F-statistic: 71.84 on 1 and 99 DF,  p-value: 2.271e-13

What have we learned by incorrectly analyzing binary (1, 0) data by forcing a linear model (linear regression) through it? The fitted line is not bounded between 0 and 1, and the errors can be neither normally distributed nor of constant variance. There has to be a better way... Logistic Regression!

res = glm( y~x, family=binomial(link="logit"))
summary(res)
## 
## Call:
## glm(formula = y ~ x, family = binomial(link = "logit"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1259  -0.5884   0.2284   0.6109   2.0707  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.01895    0.31700   3.214  0.00131 ** 
## x            0.07066    0.01358   5.201 1.98e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 132.709  on 100  degrees of freedom
## Residual deviance:  81.465  on  99  degrees of freedom
## AIC: 85.465
## 
## Number of Fisher Scoring iterations: 5
yLogit=predict(res)
plot( yLogit ~ x )

yhat=predict(res, type="response")
plot( y ~ x)
lines(yhat~x)

#### These plots show that the slope of the logistic model is linear on the logit scale (the plot of yLogit ~ x) and non-linear on the probability scale. The inverse-logit function p = exp(B0+B1X1+...+BnXn) / (1 + exp(B0+B1X1+...+BnXn)) handily keeps the predicted probability bounded between 0 and 1, always.

Objective 2.2 Univariate logistic regression with glm

#?glm
elev <- glm(used ~ Elevation2, family=binomial(logit), data=wolfkde)
summary(elev)
## 
## Call:
## glm(formula = used ~ Elevation2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.31679  -0.52729  -0.19841  -0.06223   2.98396  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.4178477  0.5100664   16.50   <2e-16 ***
## Elevation2  -0.0057787  0.0003168  -18.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2208.9  on 2412  degrees of freedom
## Residual deviance: 1519.9  on 2411  degrees of freedom
## AIC: 1523.9
## 
## Number of Fisher Scoring iterations: 6
str(elev)
## List of 30
##  $ coefficients     : Named num [1:2] 8.41785 -0.00578
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "Elevation2"
##  $ residuals        : Named num [1:2413] 6.98 7.81 6.94 6.23 22.47 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ fitted.values    : Named num [1:2413] 0.1433 0.128 0.144 0.1606 0.0445 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ effects          : Named num [1:2413] 12.73 18.24 2.34 2.19 4.54 ...
##   ..- attr(*, "names")= chr [1:2413] "(Intercept)" "Elevation2" "" "" ...
##  $ R                : num [1:2, 1:2] -15.7 0 -25019.3 -3156.6
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "(Intercept)" "Elevation2"
##   .. ..$ : chr [1:2] "(Intercept)" "Elevation2"
##  $ rank             : int 2
##  $ qr               :List of 5
##   ..$ qr   : num [1:2413, 1:2] -15.6626 0.0213 0.0224 0.0234 0.0132 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2413] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "Elevation2"
##   ..$ rank : int 2
##   ..$ qraux: num [1:2] 1.02 1.02
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-11
##   ..- attr(*, "class")= chr "qr"
##  $ family           :List of 12
##   ..$ family    : chr "binomial"
##   ..$ link      : chr "logit"
##   ..$ linkfun   :function (mu)  
##   ..$ linkinv   :function (eta)  
##   ..$ variance  :function (mu)  
##   ..$ dev.resids:function (y, mu, wt)  
##   ..$ aic       :function (y, n, mu, wt, dev)  
##   ..$ mu.eta    :function (eta)  
##   ..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         y[weights == 0] <- 0         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)         m <- weights * y         if (any(abs(m - round(m)) > 0.001))              warning("non-integer #successes in a binomial glm!")     }     else if (NCOL(y) == 2) {         if (any(abs(y - round(y)) > 0.001))              warning("non-integer counts in a binomial glm!")         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the 'binomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
##   ..$ validmu   :function (mu)  
##   ..$ valideta  :function (eta)  
##   ..$ simulate  :function (object, nsim)  
##   ..- attr(*, "class")= chr "family"
##  $ linear.predictors: Named num [1:2413] -1.79 -1.92 -1.78 -1.65 -3.07 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ deviance         : num 1520
##  $ aic              : num 1524
##  $ null.deviance    : num 2209
##  $ iter             : int 6
##  $ weights          : Named num [1:2413] 0.1228 0.1116 0.1233 0.1348 0.0425 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ prior.weights    : Named num [1:2413] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ df.residual      : int 2411
##  $ df.null          : int 2412
##  $ y                : Named num [1:2413] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "names")= chr [1:2413] "1" "2" "3" "4" ...
##  $ converged        : logi TRUE
##  $ boundary         : logi FALSE
##  $ model            :'data.frame':   2413 obs. of  2 variables:
##   ..$ used      : int [1:2413] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ Elevation2: num [1:2413] 1766 1789 1765 1743 1987 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language used ~ Elevation2
##   .. .. ..- attr(*, "variables")= language list(used, Elevation2)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "used" "Elevation2"
##   .. .. .. .. ..$ : chr "Elevation2"
##   .. .. ..- attr(*, "term.labels")= chr "Elevation2"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(used, Elevation2)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "used" "Elevation2"
##  $ call             : language glm(formula = used ~ Elevation2, family = binomial(logit), data = wolfkde)
##  $ formula          :Class 'formula'  language used ~ Elevation2
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  $ terms            :Classes 'terms', 'formula'  language used ~ Elevation2
##   .. ..- attr(*, "variables")= language list(used, Elevation2)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "used" "Elevation2"
##   .. .. .. ..$ : chr "Elevation2"
##   .. ..- attr(*, "term.labels")= chr "Elevation2"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(used, Elevation2)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "used" "Elevation2"
##  $ data             :'data.frame':   2413 obs. of  11 variables:
##   ..$ deer_w2             : int [1:2413] 4 4 4 4 NA 1 4 4 4 3 ...
##   ..$ moose_w2            : int [1:2413] 5 4 5 5 NA 1 5 5 5 5 ...
##   ..$ elk_w2              : int [1:2413] 5 4 5 5 NA 1 5 5 5 4 ...
##   ..$ sheep_w2            : int [1:2413] 3 1 4 4 NA 1 4 4 3 1 ...
##   ..$ goat_w2             : int [1:2413] 3 3 1 1 NA 4 1 1 1 3 ...
##   ..$ wolf_w2             : int [1:2413] 5 4 5 5 NA 1 5 5 5 4 ...
##   ..$ Elevation2          : num [1:2413] 1766 1789 1765 1743 1987 ...
##   ..$ DistFromHumanAccess2: num [1:2413] 427.4 360.5 283.7 167.4 27.9 ...
##   ..$ pack                : Factor w/ 2 levels "Bow Valley","Red Deer": 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ used                : int [1:2413] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ usedFactor          : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ offset           : NULL
##  $ control          :List of 3
##   ..$ epsilon: num 1e-08
##   ..$ maxit  : num 25
##   ..$ trace  : logi FALSE
##  $ method           : chr "glm.fit"
##  $ contrasts        : NULL
##  $ xlevels          : Named list()
##  - attr(*, "class")= chr [1:2] "glm" "lm"
## exploring univariate logistic regression
## how do we obtain 95% confidence intervals? Where are they in the output?
## CIs using profile log-likelihoods
confint(elev)
## Waiting for profiling to be done...
##                    2.5 %       97.5 %
## (Intercept)  7.444639636  9.445646765
## Elevation2  -0.006420221 -0.005177356
## CIs using standard errors (Wald intervals)
confint.default(elev)
##                    2.5 %       97.5 %
## (Intercept)  7.418135862  9.417559508
## Elevation2  -0.006399641 -0.005157807
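As a check on what confint.default() is doing, the SE-based (Wald) interval is just the estimate plus or minus 1.96 standard errors, using the values printed by summary(elev):

```r
## Wald 95% CI by hand: estimate +/- 1.96 * SE
## (coefficient and SE copied from the summary(elev) output above)
b  <- -0.0057787
se <-  0.0003168
b + c(-1.96, 1.96) * se   ## ~ -0.006400, -0.005158: matches confint.default(elev)
```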
## odds ratios
exp(coefficients(elev))
##  (Intercept)   Elevation2 
## 4527.1491093    0.9942379
## how to obtain 95% CIs on odds ratios
exp(cbind(OR=coef(elev), confint(elev)))
## Waiting for profiling to be done...
##                       OR        2.5 %       97.5 %
## (Intercept) 4527.1491093 1710.6687170 12652.963879
## Elevation2     0.9942379    0.9936003     0.994836
## rescaling beta coefficients and odds ratios
## note that for elevation, the change in odds is for every 1-meter change in elevation. Perhaps this is not very useful.

wolfkde$elev100 <- wolfkde$Elevation2 / 100
elev100 <- glm(used ~ elev100, family=binomial(logit), data=wolfkde)
summary(elev100)
## 
## Call:
## glm(formula = used ~ elev100, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.31679  -0.52729  -0.19841  -0.06223   2.98396  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.41785    0.51007   16.50   <2e-16 ***
## elev100     -0.57787    0.03168  -18.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2208.9  on 2412  degrees of freedom
## Residual deviance: 1519.9  on 2411  degrees of freedom
## AIC: 1523.9
## 
## Number of Fisher Scoring iterations: 6
exp(coef(elev100))
##  (Intercept)      elev100 
## 4527.1491093    0.5610909
## therefore the interpretation of the odds ratio is scale dependent
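To see this scale dependence directly: since exp(100β) = exp(β)^100, the 100-m odds ratio is simply the 1-m odds ratio raised to the 100th power. A minimal check (a sketch, assuming the `elev` and `elev100` model objects fitted above):

```r
## sketch: verify the odds-ratio scaling relationship exp(100*beta) = exp(beta)^100
or_1m   <- exp(coef(elev)["Elevation2"])   # odds ratio per 1-m increase (~0.9942)
or_100m <- exp(coef(elev100)["elev100"])   # odds ratio per 100-m increase (~0.5611)
all.equal(unname(or_1m)^100, unname(or_100m))  # TRUE, up to numerical tolerance
```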
## Unpacking logistic regression - interpreting coefficients
## exercise in EXCEL (argh!)
## repeat the EXCEL exercise in R
elevBnp = 0:3000 ## creates an integer vector elevBnp with values from 0 to 3000
str(elevBnp)
##  int [1:3001] 0 1 2 3 4 5 6 7 8 9 ...
elevPred = predict(elev, newdata=data.frame(Elevation2=elevBnp), type = "response") ## uses the predict function to predict Y values given the model object elev
hist(elevPred)

plot(elevBnp, elevPred, type="l", ylim = c(0,1.0), ylab= "Pr(Used)")

# but were there elevations from 0 - 1300m in Banff?
plot(wolfkde$Elevation2, wolfkde$used)
lines(elevBnp, elevPred) ## lines() adds to the existing plot, so no add argument is needed

#### How do wolves respond to elevation? Can we make predictions about wolf use of elevations < 1000 m based on our data in Banff? What would that be?
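One way to approach the second question is to check the observed elevation range before trusting predictions outside it. A quick sketch, assuming the wolfkde data frame loaded above:

```r
## sketch: how much of the 0-3000 m prediction grid is extrapolation?
range(wolfkde$Elevation2, na.rm = TRUE)
## predicted Pr(Used) for elevations below the observed minimum comes entirely
## from the fitted logit curve, not from data, and should be treated with caution
```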

## next human use
distHuman <- glm(used ~ DistFromHumanAccess2, family=binomial(logit), data=wolfkde)
summary(distHuman)
## 
## Call:
## glm(formula = used ~ DistFromHumanAccess2, family = binomial(logit), 
##     data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0206  -0.7409  -0.3580  -0.0697   3.2595  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -0.3807708  0.0813594   -4.68 2.87e-06 ***
## DistFromHumanAccess2 -0.0021069  0.0001602  -13.15  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2182.4  on 2375  degrees of freedom
## Residual deviance: 1811.1  on 2374  degrees of freedom
##   (37 observations deleted due to missingness)
## AIC: 1815.1
## 
## Number of Fisher Scoring iterations: 7
hist(wolfkde$DistFromHumanAccess2)

disthumanBnp = 0:7000
disthumanPred = predict(distHuman, newdata=data.frame(DistFromHumanAccess2=disthumanBnp), type="response")
hist(disthumanPred)

plot(disthumanBnp, disthumanPred, type="l", ylab= "Pr(Used)")

plot(wolfkde$DistFromHumanAccess2, wolfkde$used)
lines(disthumanBnp, disthumanPred) ## lines() adds to the existing plot, so no add argument is needed

#### Similarly, how do wolves respond to human activity?
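We can also read the response straight off the coefficients with plogis(), R's inverse-logit function. A sketch, using the intercept and slope printed in summary(distHuman) above:

```r
## sketch: predicted Pr(Used) at two distances, via the inverse logit
## plogis(x) = exp(x) / (1 + exp(x))
b <- c(-0.3807708, -0.0021069)   # intercept and slope from summary(distHuman)
plogis(b[1] + b[2] * 0)          # right at human access: about 0.41
plogis(b[1] + b[2] * 5000)       # 5 km away: effectively 0
```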

Next, the ungulate H.S.I. models.

# now let's fit all the ungulate HSI models at once
sheep <- glm(used ~ sheep_w2, family=binomial(logit), data=wolfkde)
summary(sheep)
## 
## Call:
## glm(formula = used ~ sheep_w2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7018  -0.7018  -0.6271  -0.5589   2.0746  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.15032    0.10646  -10.80   <2e-16 ***
## sheep_w2    -0.12544    0.03543   -3.54    4e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2072.0  on 2164  degrees of freedom
## Residual deviance: 2059.2  on 2163  degrees of freedom
##   (248 observations deleted due to missingness)
## AIC: 2063.2
## 
## Number of Fisher Scoring iterations: 4
deer <- glm(used ~ deer_w2, family=binomial(logit), data=wolfkde)
summary(deer)
## 
## Call:
## glm(formula = used ~ deer_w2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1022  -0.6423  -0.3672  -0.1136   3.1772  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.22995    0.32181  -19.36   <2e-16 ***
## deer_w2      1.18905    0.07295   16.30   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2072.0  on 2164  degrees of freedom
## Residual deviance: 1604.4  on 2163  degrees of freedom
##   (248 observations deleted due to missingness)
## AIC: 1608.4
## 
## Number of Fisher Scoring iterations: 6
elk <- glm(used ~ elk_w2, family=binomial(logit), data=wolfkde)
summary(elk)
## 
## Call:
## glm(formula = used ~ elk_w2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0530  -0.6626  -0.3911  -0.1288   3.0969  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.91471    0.30437  -19.43   <2e-16 ***
## elk_w2       1.12751    0.07012   16.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2072.0  on 2164  degrees of freedom
## Residual deviance: 1649.2  on 2163  degrees of freedom
##   (248 observations deleted due to missingness)
## AIC: 1653.2
## 
## Number of Fisher Scoring iterations: 6
moose <- glm(used ~ moose_w2, family=binomial(logit), data=wolfkde)
summary(moose)
## 
## Call:
## glm(formula = used ~ moose_w2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1645  -0.6737  -0.5498  -0.3599   2.3534  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.15017    0.18924 -16.647   <2e-16 ***
## moose_w2     0.44567    0.04508   9.887   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2072.0  on 2164  degrees of freedom
## Residual deviance: 1956.4  on 2163  degrees of freedom
##   (248 observations deleted due to missingness)
## AIC: 1960.4
## 
## Number of Fisher Scoring iterations: 5
goat <- glm(used ~ goat_w2, family=binomial(logit), data=wolfkde)
summary(goat)
## 
## Call:
## glm(formula = used ~ goat_w2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8988  -0.8988  -0.4074  -0.2306   2.9023  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.11474    0.10226  -1.122    0.262    
## goat_w2     -0.58315    0.04367 -13.353   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2072.0  on 2164  degrees of freedom
## Residual deviance: 1842.9  on 2163  degrees of freedom
##   (248 observations deleted due to missingness)
## AIC: 1846.9
## 
## Number of Fisher Scoring iterations: 5

And similarly, let's examine the predicted probabilities.

habvalues = 0:7 ## making a vector of hsi values
sheeppred = predict(sheep, newdata = data.frame(sheep_w2 = habvalues), type = "response")
goatpred = predict(goat, newdata = data.frame(goat_w2 = habvalues), type = "response")
moosepred = predict(moose, newdata = data.frame(moose_w2 = habvalues), type = "response")
elkpred = predict(elk, newdata = data.frame(elk_w2 = habvalues), type = "response")
deerpred = predict(deer, newdata = data.frame(deer_w2 = habvalues), type = "response")

plot(habvalues, elkpred, type ="l", ylim = c(0,1.0), ylab = "Pr(Used)", col = "green")
lines(habvalues, goatpred, col = "blue")
lines(habvalues, moosepred, col = "red") 
lines(habvalues, sheeppred, col = "black") 
lines(habvalues, deerpred, col = "gray") 
legend(x="topleft", legend= c("Elk","Mountain Goat", "Moose", "Sheep", "Deer"), lty=1, col = c("green", "blue", "red", "black", "gray"), bty = "n")

Now let's go back to our elevation model object and save predictions from the elevation model in the wolfkde data frame. We can use this to plot predicted values in most situations as well.

## back to elevation
elev <- glm(used ~ Elevation2, family=binomial(logit), data=wolfkde)
summary(elev)
## 
## Call:
## glm(formula = used ~ Elevation2, family = binomial(logit), data = wolfkde)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.31679  -0.52729  -0.19841  -0.06223   2.98396  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.4178477  0.5100664   16.50   <2e-16 ***
## Elevation2  -0.0057787  0.0003168  -18.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2208.9  on 2412  degrees of freedom
## Residual deviance: 1519.9  on 2411  degrees of freedom
## AIC: 1523.9
## 
## Number of Fisher Scoring iterations: 6
wolfkde$fitted.Elev <- fitted(elev)
head(wolfkde)
##   deer_w2 moose_w2 elk_w2 sheep_w2 goat_w2 wolf_w2 Elevation2
## 1       4        5      5        3       3       5   1766.146
## 2       4        4      4        1       3       4   1788.780
## 3       4        5      5        4       1       5   1765.100
## 4       4        5      5        4       1       5   1742.913
## 5      NA       NA     NA       NA      NA      NA   1987.394
## 6       1        1      1        1       4       1   1778.360
##   DistFromHumanAccess2     pack used usedFactor  elev100 fitted.Elev
## 1            427.39618 Red Deer    1          1 17.66146  0.14329070
## 2            360.50430 Red Deer    1          1 17.88780  0.12797116
## 3            283.66480 Red Deer    1          1 17.65100  0.14403416
## 4            167.41344 Red Deer    1          1 17.42913  0.16057354
## 5             27.90951 Red Deer    1          1 19.87394  0.04449974
## 6            622.62573 Red Deer    1          1 17.78360  0.13484260
hist(wolfkde$fitted.Elev)

plot(wolfkde$fitted.Elev, wolfkde$Elevation2)

Objective 3.0 Improving graphical predictions using ggplot2

First, let's explore the distribution of predicted values from the elevation model.

# ggplot2: explore the basic histogram function
ggplot(wolfkde, aes(x=fitted.Elev)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# let's explore faceting
ggplot(wolfkde, aes(x=fitted.Elev)) + geom_histogram(binwidth=0.05, fill="gray70", colour="black") + facet_grid(used ~ .)

ggplot(wolfkde, aes(x=fitted.Elev)) + geom_histogram(binwidth=0.05, fill="gray70", colour="black") + facet_grid(used ~ ., scales = "free")

ggplot(wolfkde, aes(x=fitted.Elev, fill=usedFactor)) + geom_histogram(binwidth=0.05, position="identity", alpha=0.7) + xlab("Predicted Probability of Wolf Use") + theme(axis.title.x=element_text(size=16))

# let's redo this graph, faceting by pack
ggplot(wolfkde, aes(x=fitted.Elev, y=..density.., fill=usedFactor)) + geom_histogram(binwidth=0.05, position="identity", alpha=0.7) + xlab("Predicted Probability of Wolf Use") + theme(axis.title.x=element_text(size=16)) + facet_grid(pack ~ ., scales="free")

#### this gives us a firm sense that things differ by pack

# Now let's explore fitting functions to the distributions
ggplot(wolfkde, aes(x=fitted.Elev)) + geom_density()

ggplot(wolfkde, aes(x=fitted.Elev, fill=usedFactor)) + geom_density(alpha=0.5) + xlim(0,1) + xlab("Predicted Probability of Wolf Use") + theme(axis.title.x=element_text(size=16)) ## note fill must go inside aes() to map to usedFactor

# kernel lines
ggplot(wolfkde, aes(x=fitted.Elev, y=..density.., fill=usedFactor)) + geom_histogram(binwidth=0.05) + geom_density(alpha = 0.5) + facet_grid(pack ~ .)

ggplot2 has a handy function, stat_smooth(), for plotting fitted logistic regression curves directly.

# Exploring Predictions as a function of covariates
#___________
# this fits a univariate glm as a function of elevation and predicts
ggplot(wolfkde, aes(x=Elevation2, y=used)) + geom_point() + stat_smooth(method="glm", method.args = list(family="binomial"))

#
ggplot(wolfkde, aes(x=DistFromHumanAccess2, y=used)) + geom_point() + stat_smooth(method="glm", method.args = list(family="binomial"))
## Warning: Removed 37 rows containing non-finite values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).

ggplot(wolfkde, aes(x=elk_w2, y=used)) + geom_point() + stat_smooth(method="glm", method.args = list(family="binomial"))
## Warning: Removed 248 rows containing non-finite values (stat_smooth).
## Warning: Removed 248 rows containing missing values (geom_point).

## but what's up with the dots? Let's jitter and see
ggplot(wolfkde, aes(x=elk_w2, y=used)) + geom_point() +geom_jitter(aes(colour = used), width=0.25, height = 0.05) + stat_smooth(method="glm", method.args = list(family="binomial"))
## Warning: Removed 248 rows containing non-finite values (stat_smooth).

## Warning: Removed 248 rows containing missing values (geom_point).

## Warning: Removed 248 rows containing missing values (geom_point).

#### The jittering reveals the clear separation between 0's and 1's: most 1's occur at H.S.I. scores > 4, and, similarly, most 0's occur at H.S.I. scores < 5.

## let's redo elevation, jittered by used
ggplot(wolfkde, aes(x=Elevation2, y=used)) + geom_point() +geom_jitter(aes(colour = used), width=0.25, height = 0.05)+ stat_smooth(method="glm", method.args = list(family="binomial"))

# Splitting by wolf pack and changing the confidence interval to 99%
ggplot(wolfkde, aes(x=Elevation2, y=used, colour=pack)) + geom_point() + stat_smooth(method="glm", method.args = list(family="binomial"), level=0.99)

ggplot(wolfkde, aes(x=Elevation2, y=used, colour=pack)) + geom_point() + geom_jitter(width=0.25, height = 0.05) +stat_smooth(method="glm", method.args = list(family="binomial"), level=0.90)

#### This is the first analysis that shows us the two wolf packs respond very differently to elevation. I wonder if they respond differently to Moose, or Sheep, say?

ggplot(wolfkde, aes(x=moose_w2, y=used, colour=pack)) + stat_smooth(method="glm", method.args = list(family="binomial"))
## Warning: Removed 248 rows containing non-finite values (stat_smooth).

ggplot(wolfkde, aes(x=sheep_w2, y=used, colour=pack)) + stat_smooth(method="glm", method.args = list(family="binomial"))
## Warning: Removed 248 rows containing non-finite values (stat_smooth).

#### This provides ample evidence that perhaps we should not be pooling over both wolf packs, and paves the way for your homework this week.

You can also facet by pack, putting each pack in its own panel.

# versus faceting by wolf pack
ggplot(wolfkde, aes(x=Elevation2, y=used)) + geom_point() + stat_smooth(method="glm", method.args = list(family="binomial"), level=0.90) + facet_grid(pack~.)

### And finally, this last plot illustrates the futility of fitting a linear model (lm) to binary data, compared against the predictions from the elev model.

# this function plots predictions from the previously fitted best model
ggplot(wolfkde, aes(x=Elevation2, y=fitted.Elev)) + geom_point() + stat_smooth(method=lm) + ylim(0, 0.8)
## Warning: Removed 40 rows containing missing values (geom_smooth).

Lastly, I will illustrate our first use of the effects package, which is especially handy later for quickly visualizing interactions.

plot(effect("Elevation2", elev), grid=TRUE) 

plot(effect("deer_w2", deer), grid=TRUE)

## but note the scales are stretched to linearize the response

Saving graphics

It is a HUGE time saver to use R to make publication-quality figures.

See, e.g., https://www.r-bloggers.com/high-resolution-figures-in-r/

# Printing PDFs from R
pdf("wolf_elev.pdf", width=4, height=4)
print(ggplot(wolfkde, aes(x=Elevation2, y=used)) + geom_point(colour="gray") + stat_smooth(method="glm", method.args = list(family="binomial")) + xlab("Prey H.S.I") + ylab("Predicted Probability of Wolf Use"))
dev.off()
## quartz_off_screen 
##                 2
# then go and look in the working directory for wolf_elev.pdf
# or
ggsave("elev_wolf2.pdf", width=4, height=4)
#

Remember that these PDFs have been saved in your working directory.

# Now - how to make a figure of all 5 prey species predictions
## code figured out from Latham et al. (2013) in Ecography
fig3 <- ggplot(wolfkde, aes(x=elk_w2, y=used)) + geom_smooth(data = wolfkde, aes(x=elk_w2, y=used, col="Elk"), method="glm", method.args = list(family="binomial")) + geom_smooth(data = wolfkde, aes(x=deer_w2, y=used, col="Deer"), method="glm", method.args = list(family="binomial")) + geom_smooth(data = wolfkde, aes(x=moose_w2, y=used, col="Moose"), method="glm", method.args = list(family="binomial")) + geom_smooth(data = wolfkde, aes(x=sheep_w2, y=used, col="Sheep"), method="glm", method.args = list(family="binomial")) + geom_smooth(data = wolfkde, aes(x=goat_w2, y=used, col="Goat"), method="glm", method.args = list(family="binomial")) + xlab("Relative probability of summer prey resource selection") + ylab("Relative probability of summer wolf resource selection") + theme(axis.title.y=element_text(size=18), axis.text.y=element_text(size=18)) + theme(axis.title.x=element_text(size=18), axis.text.x=element_text(size=18)) + labs(col="Prey Species") ## the legend is mapped to col, so label col (not fill)
## let's save this
pdf("fig3.pdf", width=4, height=4)
fig3
## Warning: Removed 248 rows containing non-finite values (stat_smooth).

## Warning: Removed 248 rows containing non-finite values (stat_smooth).

## Warning: Removed 248 rows containing non-finite values (stat_smooth).

## Warning: Removed 248 rows containing non-finite values (stat_smooth).

## Warning: Removed 248 rows containing non-finite values (stat_smooth).
dev.off()
## quartz_off_screen 
##                 2

Remember that fig3.pdf is saved in your working directory.

Homework 3: LAB 3 – ASSIGNMENT

DUE IN CLASS TUESDAY FEBRUARY 21, 2015

Answer the following questions in your lab report, following the exact guidelines of lab 2, and feel free to use TABLES and FIGURES properly labeled and in JWM format.

1) Summarize the patterns of univariate selection you have examined for ALL covariates, pooled across wolf packs and then separated by wolf pack (new). Answer the following questions.

a. What are the most important variables (strongest) in their univariate effect on wolf resource selection? What are the effects?

b. How do the patterns of selection observed this week relate to the analysis of USE from lab 2?

c. How do the effects of covariates differ between wolf packs?

2) Repeat (on your own) the steps required to merge the USED wolf data with the AVAILABLE wolf pack data from the 95% MCP home ranges we developed in LAB 2. Note: this may mean you need to quickly re-run Lab 2 to save the MCP home ranges and sample 1000 random availability points within these MCPs.

3) With the merged USED/AVAIL dataset for 95% MCP home ranges, repeat the analyses in this lab. What are the main differences (if any) between the MCP and kernel home range estimates of availability?